from IPython.display import Image
Image(url='https://unsplash.com/photos/-ca8YHMKxFU/download?ixid=M3wxMjA3fDB8MXxhbGx8fHx8fHx8fHwxNjg5OTgwOTgyfA&force=true&w=1920')
This data analysis project provides a comprehensive exploration of the Canadian economy's recovery following the COVID-19 pandemic, from the initial impact in Q2 2020 through to Q2 2023. Utilizing an extensive dataset that encompasses a variety of industries, the analysis investigates the fluctuating economic health of each sector, highlighting areas of growth and decline.
Key areas of focus include the identification of industries that experienced the best and worst growth rates between Q2 2019 and Q2 2023. The study further compares these figures to performance metrics during the first wave of the pandemic in Q2 2020, providing a nuanced understanding of industry resilience and adaptability in the face of unprecedented challenges.
Furthermore, the project provides insights into the recovery journey of the industries that were most profoundly affected during the initial stages of the pandemic. Through an in-depth investigation of these sectors, this project seeks to evaluate whether they have been able to regain their footing and adapt to the 'new normal' precipitated by the COVID-19 crisis.
This portfolio project offers valuable insights for individuals looking to understand the economic ramifications of the COVID-19 pandemic, providing a thorough and detailed analysis of recovery trends across the diverse landscape of Canadian industries.
Let's start by importing the required libraries for this project:
import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.sandbox.stats.multicomp import multipletests
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%config InlineBackend.figure_format = 'retina'
📈 We will be using the GDP by industry dataset from Statistics Canada released on 2023-06-30: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3610043402
gdp = pd.read_csv('../data/36100434_gdp_by_industry_monthly.csv', usecols = ['REF_DATE', 'GEO', 'Seasonal adjustment', 'Prices', 'North American Industry Classification System (NAICS)', 'VALUE'])
🧐 We chose the following columns from this dataset:
1. REF_DATE: The month in which the data was collected;
2. GEO: The geographic region of the data. We are only interested in country-wide measurements (Canada);
3. Seasonal adjustment: The type of adjusment. Either "Seasonally adjusted at annual rates" or "Trading-day adjusted";
4. North American Industry Classification System (NAICS): The industry name and NAICS code from which the value was obtained from;
5. VALUE: The GDP measure in Canadian dollars (x1,000,000).
gdp.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 272076 entries, 0 to 272075 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 REF_DATE 272076 non-null object 1 GEO 272076 non-null object 2 Seasonal adjustment 272076 non-null object 3 Prices 272076 non-null object 4 North American Industry Classification System (NAICS) 272076 non-null object 5 VALUE 237396 non-null float64 dtypes: float64(1), object(5) memory usage: 12.5+ MB
👉 We can optimize our dataset by changing the REF_DATE datatype to date/time and GEO, Seasonal adjustment, Prices, NAICS to category.
gdp['REF_DATE'] = pd.to_datetime(gdp['REF_DATE'])
gdp = gdp.astype({'GEO' : 'category',
'Seasonal adjustment' : 'category',
'Prices' : 'category',
'North American Industry Classification System (NAICS)': 'category'})
gdp.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 272076 entries, 0 to 272075 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 REF_DATE 272076 non-null datetime64[ns] 1 GEO 272076 non-null category 2 Seasonal adjustment 272076 non-null category 3 Prices 272076 non-null category 4 North American Industry Classification System (NAICS) 272076 non-null category 5 VALUE 237396 non-null float64 dtypes: category(4), datetime64[ns](1), float64(1) memory usage: 5.5 MB
👉 We are mostly interested in long-term trends and thus we will use Seasonally adjusted prices constant prices (in millions of dollars)
gdp = gdp.query('`Seasonal adjustment` == "Seasonally adjusted at annual rates" and Prices == "2012 constant prices"').copy()
gdp
| REF_DATE | GEO | Seasonal adjustment | Prices | North American Industry Classification System (NAICS) | VALUE | |
|---|---|---|---|---|---|---|
| 287 | 1997-01-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | All industries [T001] | 1161269.0 |
| 288 | 1997-01-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | Goods-producing industries [T002] | 402792.0 |
| 289 | 1997-01-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | Service-producing industries [T003] | 756654.0 |
| 290 | 1997-01-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | Business sector industries [T004] | NaN |
| 291 | 1997-01-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | Business sector, goods [T005] | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| 271784 | 2023-04-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | Defence services [9111] | 12660.0 |
| 271785 | 2023-04-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | Federal government public administration (exce... | 43931.0 |
| 271786 | 2023-04-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | Provincial and territorial public administrati... | 37278.0 |
| 271787 | 2023-04-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | Local, municipal and regional public administr... | 46027.0 |
| 271788 | 2023-04-01 | Canada | Seasonally adjusted at annual rates | 2012 constant prices | Aboriginal public administration [914] | 5230.0 |
90692 rows × 6 columns
✅ Let's also rename a couple of columns to make our life easier:
gdp = gdp.rename(columns = {'REF_DATE' : 'date',
'North American Industry Classification System (NAICS)': 'industry',
'VALUE' : 'value'}).set_index('date')[['industry', 'value']].copy()
gdp
| industry | value | |
|---|---|---|
| date | ||
| 1997-01-01 | All industries [T001] | 1161269.0 |
| 1997-01-01 | Goods-producing industries [T002] | 402792.0 |
| 1997-01-01 | Service-producing industries [T003] | 756654.0 |
| 1997-01-01 | Business sector industries [T004] | NaN |
| 1997-01-01 | Business sector, goods [T005] | NaN |
| ... | ... | ... |
| 2023-04-01 | Defence services [9111] | 12660.0 |
| 2023-04-01 | Federal government public administration (exce... | 43931.0 |
| 2023-04-01 | Provincial and territorial public administrati... | 37278.0 |
| 2023-04-01 | Local, municipal and regional public administr... | 46027.0 |
| 2023-04-01 | Aboriginal public administration [914] | 5230.0 |
90692 rows × 2 columns
👉 Let's separate the industry code from the industry name and assign it to a new column named 'industry_code'.
new_df = gdp['industry'].str.split('[', expand=True)
gdp['industry'] = new_df[0].str.strip()
gdp['industry_code'] = new_df[1].str.replace(']', '').str.strip()
gdp = gdp[['industry', 'industry_code', 'value']]
gdp
| industry | industry_code | value | |
|---|---|---|---|
| date | |||
| 1997-01-01 | All industries | T001 | 1161269.0 |
| 1997-01-01 | Goods-producing industries | T002 | 402792.0 |
| 1997-01-01 | Service-producing industries | T003 | 756654.0 |
| 1997-01-01 | Business sector industries | T004 | NaN |
| 1997-01-01 | Business sector, goods | T005 | NaN |
| ... | ... | ... | ... |
| 2023-04-01 | Defence services | 9111 | 12660.0 |
| 2023-04-01 | Federal government public administration (exce... | 911A | 43931.0 |
| 2023-04-01 | Provincial and territorial public administration | 912 | 37278.0 |
| 2023-04-01 | Local, municipal and regional public administr... | 913 | 46027.0 |
| 2023-04-01 | Aboriginal public administration | 914 | 5230.0 |
90692 rows × 3 columns
‼️ Cannabis has been legalized in Canada in 2018 and thus will skew these results. Let's remove them from the analysis!
gdp = gdp[~gdp['industry'].str.contains('|'.join(['Cannabis', 'cannabis']))].copy()
👉 Great, we can start taking a look at some insights from this dataset.
💰 Let's use seaborn to plot the overall growth of the Canadian economy across all industries.
# Create a line plot using Seaborn
sns.lineplot(data = gdp.query('industry == "All industries"'), x = 'date', y = 'value')
# Highlight a section using axvspan() or axhspan()
plt.axvspan('2020-01-01', '2022-01-01', color='red', alpha=0.2)
# Set labels and title
plt.xlabel('Date')
plt.ylabel('GDP by industry in CAD dollars (x 1,000,000)')
# Show the plot
plt.show()
☝️ We can clearly observe the impact of the COVID-19 pandemic on the Canadian economy in the chart above.
🧐 It does seem that the overall economy has somewhat recovered since 2020, but is this true for all industries? Let's take a look!
👉 Let's first aggregate our data by quarterly periods:
gdp_grouped = gdp.groupby([gdp.index.to_period('Q'), 'industry', 'industry_code']).median()
gdp_grouped
| value | |||
|---|---|---|---|
| date | industry | industry_code | |
| 1997Q1 | Aboriginal public administration | 914 | NaN |
| Accommodation and food services | 72 | 28186.0 | |
| Accommodation services | 721 | 9371.0 | |
| Accounting, tax preparation, bookkeeping and payroll services | 5412 | NaN | |
| Activities related to credit intermediation | 5223 | NaN | |
| ... | ... | ... | ... |
| 2023Q2 | Water, sewage and other systems | 2213 | 3375.0 |
| Wholesale electronic markets, and agents and brokers | 419 | 3107.0 | |
| Wholesale trade | 41 | 106337.0 | |
| Wineries and distilleries | 3121A | 1099.0 | |
| Wood product manufacturing | 321 | 9149.0 |
28938 rows × 1 columns
👌 Let's also specify the date range in which we will be analyzing our data. I chose 2019Q2 as the baseline for two reasons:
date_range = ['2019Q2', '2023Q2']
👇 We will use a function to filter the dataset between the time ranges above and eliminate any time periods and industries with NA's:
def filter_datarange(df, date_range, index_columns, date_column, value_column):
"""
This function will filter the dataset between specific time periods and eliminate any time periods and industries with NA's.
It will then calculate the cumulative growth since the first period in the date_range
Parameters
----------
df : Panda's DataFrame
DataFrame to apply function
date_range : list or tupple
Two elements, first and last periods to analyze
index_columns : list or tupple
Column names to create index from
date_column : str
Column name containing dates
value_column: str
Column name containing values
Returns
-------
Panda's DataFrame
"""
df = df.reset_index()
df_pivot = df[(df[date_column] >= date_range[0]) & (df[date_column] <= date_range[1])].pivot(index = index_columns, columns = date_column, values = value_column).dropna(how = 'any')
df_melt = df_pivot.reset_index().melt(id_vars = index_columns, var_name = date_column, value_name = value_column).set_index(date_column).sort_index()
df_melt['cumulative_growth'] = df_melt.groupby(index_columns).transform(lambda x: (x - x.iloc[0]) / x.iloc[0] * 100)
return df_melt
gdp_clean = filter_datarange(df = gdp_grouped, date_range = date_range, index_columns = ['industry', 'industry_code'], date_column = 'date', value_column = 'value')
gdp_clean
| industry | industry_code | value | cumulative_growth | |
|---|---|---|---|---|
| date | ||||
| 2019Q2 | Aboriginal public administration | 914 | 4517.0 | 0.000000 |
| 2019Q2 | Accommodation and food services | 72 | 43887.0 | 0.000000 |
| 2019Q2 | Accommodation services | 721 | 13599.0 | 0.000000 |
| 2019Q2 | Accounting, tax preparation, bookkeeping and p... | 5412 | 12954.0 | 0.000000 |
| 2019Q2 | Activities related to credit intermediation | 5223 | 3187.0 | 0.000000 |
| ... | ... | ... | ... | ... |
| 2023Q2 | Water, sewage and other systems | 2213 | 3375.0 | 3.147922 |
| 2023Q2 | Wholesale electronic markets, and agents and b... | 419 | 3107.0 | 2.541254 |
| 2023Q2 | Wholesale trade | 41 | 106337.0 | 1.487908 |
| 2023Q2 | Wineries and distilleries | 3121A | 1099.0 | 15.320042 |
| 2023Q2 | Wood product manufacturing | 321 | 9149.0 | 2.820859 |
4641 rows × 4 columns
Great, this dataset is now ready to be analyzed!
👇 Let's create a function to visualize the cumulative growth of specific industries using plotly:
def plot_line_graph(df, industry_codes, y_col, title, y_axis_title):
"""
This function will plot a line graph based on the list of industry codes provided and name it according to title.
Parameters
----------
df : Panda's DataFrame
DataFrame to plot data from
industry_codes : list or tupple
List of industry codes to filter df
y_col: str
Column to take the y-axis values from
title : str
Plot title
Returns
-------
plotly plot
"""
industry_codes = industry_codes
df.index = df.index.astype('str')
fig = px.line(df.query('industry_code in @industry_codes'), y=y_col, color='industry', markers=True, color_discrete_sequence=px.colors.qualitative.Dark24, template='plotly_white')
# Modify the layout to change the axis labels
fig.update_layout(
title_text=title,
xaxis_title='Year', # Set the x-axis label
yaxis_title=' '.join([y_axis_title, 'since', date_range[0]]),
shapes=[
dict(
type='line',
xref='x',
yref='y',
x0=date_range[0],
y0=0,
x1=date_range[1],
y1=0,
line=dict(
color='black',
width=2,
dash='dash'
)
)
]
)
fig.show()
🧐 Now let's filter our dataset to find out which industries had the highest cumulative growth since 2019Q2.
best_industries_gdp = gdp_clean.xs(date_range[1]).nlargest(n = 10, columns = 'cumulative_growth')
best_industries_gdp_list = best_industries_gdp['industry_code'].tolist()
best_industries_gdp
| industry | industry_code | value | cumulative_growth | |
|---|---|---|---|---|
| date | ||||
| 2023Q2 | Data processing, hosting, and related services | 518 | 6775.0 | 64.641555 |
| 2023Q2 | Non-store retailers | 454 | 6290.0 | 50.514477 |
| 2023Q2 | Tobacco manufacturing | 3122 | 1354.0 | 43.736730 |
| 2023Q2 | Computer systems design and related services | 5415 | 45908.0 | 37.944712 |
| 2023Q2 | Computer and peripheral equipment manufacturing | 3341 | 259.0 | 35.602094 |
| 2023Q2 | Industrial machinery manufacturing | 3332 | 2800.0 | 35.200386 |
| 2023Q2 | Engine, turbine and power transmission equipme... | 3336 | 677.0 | 31.456311 |
| 2023Q2 | Warehousing and storage | 493 | 4998.0 | 28.351310 |
| 2023Q2 | Pesticide, fertilizer and other agricultural c... | 3253 | 2915.0 | 28.244611 |
| 2023Q2 | Ventilation, heating, air-conditioning and com... | 3334 | 2453.0 | 26.248070 |
plot_line_graph(df = gdp_clean, industry_codes = best_industries_gdp_list, y_col = 'cumulative_growth', title = ''.join(['Industries with the highest cumulative growth since ', date_range[0]]), y_axis_title = 'Cumulative growth (%)')
🧐 It looks like "Data processing, hosting, and related services" has been the industry with the highest growth in Canada since 2019Q2!
👇 In fact, we can visualize how massive this growth has been when compared to the industry average below:
def plot_line_industries(df, industry_names):
"""
This function will plot a line graph based on the list of industry names provided.
Parameters
----------
df : Panda's DataFrame
DataFrame to plot data from
industry_names : list or tupple
List of industry names to filter df
Returns
-------
seaborn plot
"""
sns.relplot(data = df.query('industry in @industry_names'), x = 'date', y = 'cumulative_growth', hue = 'industry', kind = 'line')
plt.xticks(df.index[gdp_clean.index.str.contains('Q2')].unique().to_list())
# Set labels and title
plt.xlabel('Date')
plt.ylabel('Cumulative growth (%) since '+date_range[0])
#Set threshold
plt.axhline(0, color='black', linestyle='--')
# Show the plot
plt.show()
plot_line_industries(df = gdp_clean, industry_names = ["All industries", "Data processing, hosting, and related services"])
👉 Let's repeat the analysis and look at the industries with the lowest growth since 2019Q2:
worst_industries_gdp = gdp_clean.xs(date_range[1]).nsmallest(n = 10, columns = 'cumulative_growth')
worst_industries_gdp_list = worst_industries_gdp['industry_code'].tolist()
worst_industries_gdp
| industry | industry_code | value | cumulative_growth | |
|---|---|---|---|---|
| date | ||||
| 2023Q2 | Management of companies and enterprises | 55 | 1504.0 | -72.993356 |
| 2023Q2 | Other non-metallic mineral mining and quarryin... | 21239X | 2071.0 | -35.981453 |
| 2023Q2 | Non-ferrous metal (except aluminum) production... | 3314 | 2232.0 | -35.862069 |
| 2023Q2 | Electric lighting equipment manufacturing | 3351 | 488.0 | -32.782369 |
| 2023Q2 | Automotive equipment rental and leasing | 5321 | 2974.0 | -31.221092 |
| 2023Q2 | Printing and related support activities | 323 | 3289.0 | -31.134841 |
| 2023Q2 | Copper, nickel, lead and zinc ore mining | 21223 | 4571.0 | -29.892638 |
| 2023Q2 | Other furniture-related product manufacturing | 3379 | 258.0 | -29.508197 |
| 2023Q2 | Motor vehicle manufacturing | 3361 | 4961.0 | -27.449547 |
| 2023Q2 | Hardware manufacturing | 3325 | 522.0 | -27.094972 |
plot_line_graph(df = gdp_clean, industry_codes = worst_industries_gdp_list, y_col = 'cumulative_growth', title = ''.join(['Industries with the lowest cumulative growth since ', date_range[0]]), y_axis_title = 'Cumulative growth (%)')
🧐 It looks like "Management of companies and enterprises" has been the industry with the lowest growth in Canada since 2019Q2.
👇 In fact, this industry has been steadily declining since at least 2019Q2!
plot_line_industries(df = gdp_clean, industry_names = ["All industries", "Management of companies and enterprises"])
👇 Interestingly, if we look at historical data we can see that the "Data processing, hosting, and related services" industry started gaining a lot of momentum at a similar time when "Management of companies and enterprises" started to decline.
🧐 This may represent a shift in the Canadian economy to a more tech and data-focused around 2015.
data_processing = gdp_grouped.xs('Data processing, hosting, and related services', level=1).dropna()
management = gdp_grouped.xs('Management of companies and enterprises', level = 1).dropna()
data_processing_management = pd.concat([data_processing, management], keys = ['Data processing, hosting, and related services', 'Management of companies and enterprises']).reset_index(names = ['industry', 'date', 'industry_code'])
data_processing_management['date'] = data_processing_management['date'].astype('str')
sns.relplot(data_processing_management, x = 'date', y = 'value', kind = 'line', hue = 'industry')
plt.xticks(data_processing_management[data_processing_management['date'].str.contains('Q2')]['date'].unique().tolist())
plt.xticks(rotation=90)
plt.axvline('2015Q2', color='black', linestyle='--')
# Set labels and title
plt.xlabel('Date')
plt.ylabel('GDP value per industry in CAD dollars (x 1,000,000)')
plt.show()
👉 The COVID-19 pandemic hit the economy the hardest in 2020Q2. What were the industries that had the best and worst performance during this period of time?
📈 Let's create a function to visualize the types of industries most affected during the pandemic:
def plot_bar_plot(df, title, xaxis_title):
"""
This function will plot a bar graph based on the dataframe provided.
Parameters
----------
df : Panda's DataFrame
DataFrame to plot data from
title : str
Plot title
Returns
-------
plotly plot
"""
fig = px.bar(df, x = 'cumulative_growth', y = 'industry', color='industry', color_discrete_sequence=px.colors.qualitative.Dark24, template='plotly_white')
fig.update_layout(
showlegend=False,
title_text=title,
yaxis_title='Industry', # Set the x-axis label
xaxis_title=xaxis_title)
fig.show()
best_industries_gdp_2020 = gdp_clean.xs('2020Q2').nlargest(n = 10, columns = 'cumulative_growth')
best_industries_gdp_2020
| industry | industry_code | value | cumulative_growth | |
|---|---|---|---|---|
| date | ||||
| 2020Q2 | Non-store retailers | 454 | 5748.0 | 37.544867 |
| 2020Q2 | Wineries and distilleries | 3121A | 1237.0 | 29.800630 |
| 2020Q2 | Converted paper product manufacturing | 3222 | 3200.0 | 12.478032 |
| 2020Q2 | Couriers and messengers | 492 | 3837.0 | 10.960093 |
| 2020Q2 | Iron ore mining | 21221 | 7541.0 | 9.084334 |
| 2020Q2 | Banking, monetary authorities and other deposi... | 52BX | 71491.0 | 8.726598 |
| 2020Q2 | Depository credit intermediation and monetary ... | 52B | 76358.0 | 8.387628 |
| 2020Q2 | Postal service and couriers and messengers | 49A | 7387.0 | 8.091893 |
| 2020Q2 | Data processing, hosting, and related services | 518 | 4435.0 | 7.776428 |
| 2020Q2 | Crop production | 111 | 30310.0 | 6.721594 |
plot_bar_plot(df = best_industries_gdp_2020, title = 'Industries with the best performance during the first wave of the COVID-19 pandemic (2020Q2)', xaxis_title = 'GDP growth (%) in 2020Q2')
☝️ Not surprisingly, the best performing industry during the height of the COVID-19 pandemic was "Non-store retailers", which include online shops.
worst_industries_gdp_2020 = gdp_clean.xs('2020Q2').nsmallest(n = 10, columns = 'cumulative_growth')
worst_industries_gdp_2020
| industry | industry_code | value | cumulative_growth | |
|---|---|---|---|---|
| date | ||||
| 2020Q2 | Air transportation | 481 | 216.0 | -97.691074 |
| 2020Q2 | Urban transit systems | 4851 | 813.0 | -86.508463 |
| 2020Q2 | Motor vehicle manufacturing | 3361 | 1342.0 | -80.374378 |
| 2020Q2 | Motor vehicles and parts manufacturing | 336Y | 4808.0 | -72.183975 |
| 2020Q2 | Motor vehicle parts manufacturing | 3363 | 2732.0 | -70.836892 |
| 2020Q2 | Accommodation services | 721 | 4134.0 | -69.600706 |
| 2020Q2 | Transit, ground passenger, scenic and sightsee... | 48Z | 3100.0 | -67.495019 |
| 2020Q2 | Hardware manufacturing | 3325 | 241.0 | -66.340782 |
| 2020Q2 | Gambling industries | 7132 | 1061.0 | -62.429178 |
| 2020Q2 | Performing arts, spectator sports and related ... | 71A | 2973.0 | -61.855273 |
plot_bar_plot(df = worst_industries_gdp_2020, title = 'Industries with the worst performance during the firts wave of the COVID-19 pandemic (2020Q2)', xaxis_title = 'GDP growth (%) in 2020Q2')
☝️ Also not surprisingly, the worst performing industries during 2020Q2 were related to transporation, travel, and leasure.
👉 A lot of industries have struggled to recover from the COVID-19 pandemic, whereas others that initially performed well have declined since the tech boom in 2020.
👇 Let's take a look at how the industries with the best and worst performance in 2020 have performed since the end of the COVID-19 pandemic:
best_and_worst_industries_gdp_2020 = pd.concat([best_industries_gdp_2020, worst_industries_gdp_2020], keys = ['best', 'worst']).reset_index(names=['category', 'date'])[['industry', 'industry_code', 'category']]
best_and_worst_industries_gdp = gdp_clean.reset_index(names='date').merge(best_and_worst_industries_gdp_2020, how='inner', on=['industry', 'industry_code']).sort_values(['date', 'category'])
#best_and_worst_industries_gdp = best_and_worst_industries_gdp.groupby(['category', 'date'])['cumulative_growth'].aggregate(['mean', 'std'])
best_and_worst_industries_gdp
| date | industry | industry_code | value | cumulative_growth | category | |
|---|---|---|---|---|---|---|
| 34 | 2019Q2 | Banking, monetary authorities and other deposi... | 52BX | 65753.0 | 0.000000 | best |
| 51 | 2019Q2 | Converted paper product manufacturing | 3222 | 2845.0 | 0.000000 | best |
| 68 | 2019Q2 | Couriers and messengers | 492 | 3458.0 | 0.000000 | best |
| 85 | 2019Q2 | Crop production | 111 | 28401.0 | 0.000000 | best |
| 102 | 2019Q2 | Data processing, hosting, and related services | 518 | 4115.0 | 0.000000 | best |
| ... | ... | ... | ... | ... | ... | ... |
| 220 | 2023Q2 | Motor vehicle parts manufacturing | 3363 | 10474.0 | 11.806149 | worst |
| 237 | 2023Q2 | Motor vehicles and parts manufacturing | 336Y | 16689.0 | -3.448076 | worst |
| 271 | 2023Q2 | Performing arts, spectator sports and related ... | 71A | 6417.0 | -17.667436 | worst |
| 305 | 2023Q2 | Transit, ground passenger, scenic and sightsee... | 48Z | 7426.0 | -22.134843 | worst |
| 322 | 2023Q2 | Urban transit systems | 4851 | 4404.0 | -26.916694 | worst |
340 rows × 6 columns
best_and_worst_industries_gdp_2020_list = best_and_worst_industries_gdp_2020['industry_code'].tolist()
plot_line_graph(df = gdp_clean, industry_codes = best_and_worst_industries_gdp_2020_list, y_col = 'cumulative_growth', title = 'Industries with the lowest cumulative growth in 2020Q2', y_axis_title = 'Cumulative growth (%)')
Overall we can see the following trends:
🧐 One way to test if the growth of the industries impacted in 2020 has recovered after the end of the COVID-19 pandemic is to run a statistical comparison between each category and the average across all industries in each time point.
👉 We have two hypothesis: either the industries show a similar cumulative growth to the average across all industries (null-hypothesis) or they show a significantly different cumulative growth (alternative hypothesis) over time.
✅ First we have to prepare our dataset. We have to separate industries into three categories: those that performed the best, the worst, and the average across industries in 2020Q2.
#The first quarter in our comparison, which is 2020Q2
start_quarter = pd.PeriodIndex([date_range[0]], freq='Q').to_timestamp()
#Gather industries that are not within the best nor the worst datasets
cat_industries_gdp = gdp_clean.query('industry not in @best_and_worst_industries_gdp["industry"]').query('industry not in ["All industries"]').reset_index('date')
cat_industries_gdp['category'] = 'average'
#Then concatenate the average, best, and worst industries together
cat_industries_gdp = pd.concat([cat_industries_gdp, best_and_worst_industries_gdp]).reset_index(drop = True)
cat_industries_gdp['category'] = cat_industries_gdp['category'].astype('category')
cat_industries_gdp
| date | industry | industry_code | value | cumulative_growth | category | |
|---|---|---|---|---|---|---|
| 0 | 2019Q2 | Aboriginal public administration | 914 | 4517.0 | 0.000000 | average |
| 1 | 2019Q2 | Accommodation and food services | 72 | 43887.0 | 0.000000 | average |
| 2 | 2019Q2 | Accounting, tax preparation, bookkeeping and p... | 5412 | 12954.0 | 0.000000 | average |
| 3 | 2019Q2 | Activities related to credit intermediation | 5223 | 3187.0 | 0.000000 | average |
| 4 | 2019Q2 | Administrative and support services | 561 | 45215.0 | 0.000000 | average |
| ... | ... | ... | ... | ... | ... | ... |
| 4619 | 2023Q2 | Motor vehicle parts manufacturing | 3363 | 10474.0 | 11.806149 | worst |
| 4620 | 2023Q2 | Motor vehicles and parts manufacturing | 336Y | 16689.0 | -3.448076 | worst |
| 4621 | 2023Q2 | Performing arts, spectator sports and related ... | 71A | 6417.0 | -17.667436 | worst |
| 4622 | 2023Q2 | Transit, ground passenger, scenic and sightsee... | 48Z | 7426.0 | -22.134843 | worst |
| 4623 | 2023Q2 | Urban transit systems | 4851 | 4404.0 | -26.916694 | worst |
4624 rows × 6 columns
👇 Let's run multiple t-tests and verify if our hypothesis is rejected (p<0.05):
### Function to define significance level
def p_value_signif(x):
if x < 0.001:
return '***'
elif x < 0.01:
return '**'
elif x < 0.05:
return '*'
else:
return 'n.s'
### Function to calculate t-test
def t_test_quarters(df, date_column, group_column, groups, control, value_column):
# Initialize lists to store values
p_values = []
comparison = []
quarters_list = []
# Unique quarters
quarters = df[date_column].unique()
# Length of comparison groups
len_groups = len(groups)
# Perform t-test for each quarter
for quarter in quarters:
df_quarter = df[df[date_column] == quarter]
# Perform comparison between each group and control
for i in range(len_groups):
t_stat, p_value = stats.ttest_ind(df_quarter[df_quarter[group_column] == groups[i]][value_column],
df_quarter[df_quarter[group_column] == control][value_column], equal_var=False)
p_values.append(p_value)
quarters_list.append(quarter)
comparison.append('-'.join([groups[i], control]))
# Perform multiple test correction
reject, pvals_corrected, _, _ = multipletests(p_values, method='holm')
# Create DataFrame with quarters, comparison, and p-values
quarters = quarters * 3
p_values_df = pd.DataFrame({
'quarter': quarters_list,
'comparison' : comparison,
'p_value': pvals_corrected
})
# Apply significance level function
p_values_df['p_value_signif'] = p_values_df['p_value'].apply(p_value_signif)
return p_values_df
cat_industries_gdp_ttest = t_test_quarters(df=cat_industries_gdp, date_column='date', group_column='category', groups = ['best', 'worst'], control = 'average', value_column='cumulative_growth')
cat_industries_gdp_ttest
| quarter | comparison | p_value | p_value_signif | |
|---|---|---|---|---|
| 0 | 2019Q2 | best-average | NaN | n.s |
| 1 | 2019Q2 | worst-average | NaN | n.s |
| 2 | 2019Q3 | best-average | 2.917339e-01 | n.s |
| 3 | 2019Q3 | worst-average | 1.000000e+00 | n.s |
| 4 | 2019Q4 | best-average | 1.638482e-01 | n.s |
| 5 | 2019Q4 | worst-average | 1.000000e+00 | n.s |
| 6 | 2020Q1 | best-average | 4.358711e-02 | * |
| 7 | 2020Q1 | worst-average | 7.997261e-01 | n.s |
| 8 | 2020Q2 | best-average | 1.327263e-04 | *** |
| 9 | 2020Q2 | worst-average | 7.713381e-07 | *** |
| 10 | 2020Q3 | best-average | 8.187044e-03 | ** |
| 11 | 2020Q3 | worst-average | 6.260518e-02 | n.s |
| 12 | 2020Q4 | best-average | 3.447360e-03 | ** |
| 13 | 2020Q4 | worst-average | 3.667422e-02 | * |
| 14 | 2021Q1 | best-average | 1.063311e-02 | * |
| 15 | 2021Q1 | worst-average | 1.055122e-02 | * |
| 16 | 2021Q2 | best-average | 4.755808e-02 | * |
| 17 | 2021Q2 | worst-average | 2.547329e-03 | ** |
| 18 | 2021Q3 | best-average | 6.260518e-02 | n.s |
| 19 | 2021Q3 | worst-average | 2.770224e-03 | ** |
| 20 | 2021Q4 | best-average | 6.260518e-02 | n.s |
| 21 | 2021Q4 | worst-average | 3.976040e-03 | ** |
| 22 | 2022Q1 | best-average | 6.260518e-02 | n.s |
| 23 | 2022Q1 | worst-average | 2.505858e-03 | ** |
| 24 | 2022Q2 | best-average | 6.260518e-02 | n.s |
| 25 | 2022Q2 | worst-average | 7.867950e-03 | ** |
| 26 | 2022Q3 | best-average | 8.845680e-02 | n.s |
| 27 | 2022Q3 | worst-average | 5.509357e-03 | ** |
| 28 | 2022Q4 | best-average | 6.829864e-02 | n.s |
| 29 | 2022Q4 | worst-average | 1.164943e-02 | * |
| 30 | 2023Q1 | best-average | 1.528044e-01 | n.s |
| 31 | 2023Q1 | worst-average | 3.619622e-02 | * |
| 32 | 2023Q2 | best-average | 1.836634e-01 | n.s |
| 33 | 2023Q2 | worst-average | 1.202744e-01 | n.s |
👇 Great, let's visualize how the cumulative growth of each group has progressed over time and how they deviate from the industry average:
sns.lineplot(data = cat_industries_gdp, x = 'date', y = 'cumulative_growth', hue = 'category', errorbar='sd')
plt.xticks(rotation=90)
plt.title('Recovery of industries based on their performace in 2020Q2')
plt.axhline(0, color='black', linestyle='--')
## Add p-value at each timepoint.
##Green represents the best industries vs. average. Orange represents the worst industries vs. average.
for i in range(len(cat_industries_gdp_ttest['quarter'].unique())):
plt.text(cat_industries_gdp_ttest.query('comparison == "best-average"')['quarter'].iloc[i], 50, cat_industries_gdp_ttest.query('comparison == "best-average"')['p_value_signif'].iloc[i], rotation=90, color = 'darkorange', horizontalalignment = 'center')
plt.text(cat_industries_gdp_ttest.query('comparison == "worst-average"')['quarter'].iloc[i], -95, cat_industries_gdp_ttest.query('comparison == "worst-average"')['p_value_signif'].iloc[i], rotation=90, color = 'darkgreen', horizontalalignment = 'center')
plt.legend(loc = 'upper left')
plt.xlabel('Date')
plt.ylabel('Cumulative growth since '+date_range[0]+' (%)')
plt.ylim(-100, 100)
plt.show()
✅ We can clearly see that, after 2020Q2, the industries in the best/worst performance have significantly deviated from the industry average, as expected.
🧐 Interestingly, whereas the growth of the best performing industries have slowly fallen back to the industry average, the growth of the worst performing industries were still struggling until early 2023.
👉 This indicates that the industries most affected by the COVID-19 pandemic have not been able to fully recover just yet, although the trend seems to be getting better in 2023.
📈 To verify how employment patterns have changed in the last couple of years, we will be using the employment and weekly income by industry dataset from Statistics Canada released on 2023-06-29: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1410022002
employment_earnings = pd.read_csv('/Users/dudi/Dropbox/UofToronto/Project/Portfolio/covid_recovery/data/14100220_employment_earnings_by_industry_weekly.csv', usecols = ['REF_DATE', 'Estimate', 'North American Industry Classification System (NAICS)', 'VALUE'])
🧐 We chose the following columns from this dataset:
1. REF_DATE: The month in which the data was collected;
2. Estimate: The type of data, either "Employment for all employees" or "Average weekly earnings including overtime for all employees".
3. North American Industry Classification System (NAICS): The industry name and NAICS code from which the value was obtained from;
4. VALUE: The GDP measure in Canadian dollars (x1,000,000).
employment_earnings.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 231904 entries, 0 to 231903 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 REF_DATE 231904 non-null object 1 Estimate 231904 non-null object 2 North American Industry Classification System (NAICS) 231904 non-null object 3 VALUE 214765 non-null float64 dtypes: float64(1), object(3) memory usage: 7.1+ MB
👉 Once again, we can optimize our dataset by changing the REF_DATE datatype to date/time and NAICS to category.
employment_earnings['REF_DATE'] = pd.to_datetime(employment_earnings['REF_DATE'])
employment_earnings = employment_earnings.astype({'Estimate' : 'category',
'North American Industry Classification System (NAICS)': 'category'})
employment_earnings.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 231904 entries, 0 to 231903 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 REF_DATE 231904 non-null datetime64[ns] 1 Estimate 231904 non-null category 2 North American Industry Classification System (NAICS) 231904 non-null category 3 VALUE 214765 non-null float64 dtypes: category(2), datetime64[ns](1), float64(1) memory usage: 4.2 MB
👉 Similarly to what we have done before, let's rename our columns:
👉 Let's also separate the industry code from the industry name and assign it to 'industry_code'.
employment_earnings = employment_earnings.rename(columns = {'REF_DATE' : 'date',
'North American Industry Classification System (NAICS)': 'industry',
'VALUE' : 'value',
'Estimate' : 'estimate'}).set_index('date')[['industry', 'estimate', 'value']].copy()
new_df = employment_earnings['industry'].str.split('[', expand=True)
employment_earnings['industry'] = new_df[0].str.strip()
employment_earnings['industry_code'] = new_df[1].str.replace(']', '').str.strip()
employment_earnings = employment_earnings[['industry', 'industry_code', 'estimate', 'value']]
employment_earnings = employment_earnings[~employment_earnings['industry'].str.contains('|'.join(['Cannabis', 'cannabis']))].copy()
employment_earnings['estimate'].unique()
['Employment for all employees', 'Average weekly earnings including overtime fo...] Categories (2, object): ['Average weekly earnings including overtime fo..., 'Employment for all employees']
We have two types of data in this dataset:
Employment number for all employess per each industry
Average weekly earning (including overtime) per each industry
✅ Let's separate them into two distinct datasets: employment_df and income_df
employment_df = employment_earnings.query('estimate == "Employment for all employees"').drop('estimate', axis=1)
income_df = employment_earnings.query('estimate == "Average weekly earnings including overtime for all employees"').drop('estimate', axis=1)
employment_df
| industry | industry_code | value | |
|---|---|---|---|
| date | |||
| 2001-01-01 | Industrial aggregate including unclassified bu... | 00-91N | 12914390.0 |
| 2001-01-01 | Industrial aggregate excluding unclassified bu... | 11-91N | 12835195.0 |
| 2001-01-01 | Goods producing industries | 11-33N | 2910904.0 |
| 2001-01-01 | Forestry, logging and support | 11N | 70073.0 |
| 2001-01-01 | Forestry and logging | 113 | 50650.0 |
| ... | ... | ... | ... |
| 2023-04-01 | Federal government public administration | 911 | 262969.0 |
| 2023-04-01 | Provincial and territorial public administration | 912 | 340516.0 |
| 2023-04-01 | Local, municipal and regional public administr... | 913 | 482325.0 |
| 2023-04-01 | Indigenous public administration | 914 | 69496.0 |
| 2023-04-01 | Unclassified businesses | 00 | 338373.0 |
115416 rows × 3 columns
📈 Let's take an overall look at the historical employment data from early 2000's to 2023
# Create a line plot using Seaborn
sns.lineplot(data = employment_df.query('industry == "Industrial aggregate excluding unclassified businesses"'), x = 'date', y = 'value')
# Highlight a section using axvspan() or axhspan()
plt.axvspan('2020-01-01', '2022-01-01', color='red', alpha=0.2)
# Set labels and title
plt.xlabel('Date')
plt.ylabel('Employment for all industries (x 10,000,000)')
# Show the plot
plt.show()
☝️ We can again observe the impact of the COVID-19 pandemic on the employment numbers in the chart above.
🧐 How has employment numbers recovered for different industries since the end of the pandemic?
👉 Let's first aggregate our data by quarterly periods:
employment_grouped = employment_df.groupby([employment_df.index.to_period('Q'), 'industry', 'industry_code']).median()
employment_grouped
| value | |||
|---|---|---|---|
| date | industry | industry_code | |
| 2001Q1 | Accommodation and food services | 72 | 947156.0 |
| Accommodation services | 721 | 186626.0 | |
| Accounting, tax preparation, bookkeeping and payroll services | 5412 | 65452.0 | |
| Activities related to credit intermediation | 5223 | 12161.0 | |
| Activities related to real estate | 5313 | 35544.0 | |
| ... | ... | ... | ... |
| 2023Q2 | Web search portals, libraries and archives, and all other information services | 5192 | 26583.0 |
| Web search portals, libraries, archives, and all other information services | 519 | 26583.0 | |
| Wholesale trade | 41 | 820614.0 | |
| Wired and wireless telecommunications carriers (except satellite) | 5173 | 100164.0 | |
| Wood product manufacturing | 321 | 94213.0 |
38736 rows × 1 columns
👇 We can use the filter_datarange function from the previous section to filter the dataset and eliminate any time periods and industries with NA's
employment_clean = filter_datarange(df = employment_grouped, date_range = date_range, index_columns = ['industry', 'industry_code'], date_column = 'date', value_column = 'value')
employment_clean
| industry | industry_code | value | cumulative_growth | |
|---|---|---|---|---|
| date | ||||
| 2019Q2 | Accommodation and food services | 72 | 1335643.0 | 0.000000 |
| 2019Q2 | Accommodation services | 721 | 203289.0 | 0.000000 |
| 2019Q2 | Accounting, tax preparation, bookkeeping and p... | 5412 | 112364.0 | 0.000000 |
| 2019Q2 | Activities related to credit intermediation | 5223 | 28614.0 | 0.000000 |
| 2019Q2 | Activities related to real estate | 5313 | 61290.0 | 0.000000 |
| ... | ... | ... | ... | ... |
| 2023Q2 | Web search portals, libraries and archives, an... | 5192 | 26583.0 | 22.282534 |
| 2023Q2 | Web search portals, libraries, archives, and a... | 519 | 26583.0 | 22.282534 |
| 2023Q2 | Wholesale trade | 41 | 820614.0 | -0.127911 |
| 2023Q2 | Wired and wireless telecommunications carriers... | 5173 | 100164.0 | -4.625697 |
| 2023Q2 | Wood product manufacturing | 321 | 94213.0 | -1.838963 |
6341 rows × 4 columns
🧐 Let's check which industries have accumulated the highest employment growth since 2019Q2
best_industries_employment = employment_clean.xs(date_range[1]).nlargest(n = 10, columns = 'cumulative_growth')
best_industries_employment
| industry | industry_code | value | cumulative_growth | |
|---|---|---|---|---|
| date | ||||
| 2023Q2 | Resin, synthetic rubber, and artificial and sy... | 3252 | 7521.0 | 68.405732 |
| 2023Q2 | Support activities for rail transportation | 4882 | 6375.0 | 64.516129 |
| 2023Q2 | Computing infrastructure providers, data proce... | 518,5182 | 31807.0 | 57.538385 |
| 2023Q2 | Mineral, ore and precious metal merchant whole... | 4185 | 835.0 | 55.204461 |
| 2023Q2 | Computer systems design and related services | 5415 | 387582.0 | 50.284414 |
| 2023Q2 | Local messengers and local delivery | 4922 | 12111.0 | 49.334155 |
| 2023Q2 | Community food and housing, and emergency and ... | 6242 | 23966.0 | 47.093844 |
| 2023Q2 | Household appliance manufacturing | 3352 | 2542.0 | 44.267877 |
| 2023Q2 | Scientific research and development services | 5417 | 73553.0 | 41.854545 |
| 2023Q2 | Railroad rolling stock manufacturing | 3365 | 6157.0 | 41.280404 |
🧐🧐 And which one of these industries have also accumulated the highest growth since 2019Q2?
plot_line_graph(df = employment_clean, industry_codes = best_industries_gdp_list, y_col = 'cumulative_growth', title = ''.join(['Employment within industries with the highest cumulative growth since ', date_range[0]]), y_axis_title = 'Cumulative growth (%)')
📈 Nice! We can clearly see, once again, that industries related to computer system designs and related services have had an enourmous growth since 2019!
👌 So far we have seen that data- and computer-related industries have grown massively since the COVID-19 pandemic.
💰 Have salaries followed suit? Let's take a look!
income_df
| industry | industry_code | value | |
|---|---|---|---|
| date | |||
| 2001-01-01 | Industrial aggregate including unclassified bu... | 00-91N | NaN |
| 2001-01-01 | Industrial aggregate excluding unclassified bu... | 11-91N | 657.36 |
| 2001-01-01 | Goods producing industries | 11-33N | 819.81 |
| 2001-01-01 | Forestry, logging and support | 11N | 769.99 |
| 2001-01-01 | Forestry and logging | 113 | 776.20 |
| ... | ... | ... | ... |
| 2023-04-01 | Federal government public administration | 911 | 1897.02 |
| 2023-04-01 | Provincial and territorial public administration | 912 | 1608.10 |
| 2023-04-01 | Local, municipal and regional public administr... | 913 | 1358.40 |
| 2023-04-01 | Indigenous public administration | 914 | 987.87 |
| 2023-04-01 | Unclassified businesses | 00 | NaN |
115416 rows × 3 columns
👇 First let's take a look at the average weekly income across all industries (in CAD dollars):
# Create a line plot using Seaborn
sns.lineplot(data = income_df.query('industry == "Industrial aggregate excluding unclassified businesses"'), x = 'date', y = 'value')
# Highlight a section using axvspan() or axhspan()
plt.axvspan('2020-01-01', '2022-01-01', color='red', alpha=0.2)
# Set labels and title
plt.xlabel('Date')
plt.ylabel('Average weekly income across all industries (CAD $)')
# Show the plot
plt.show()
📈 Interestingly, the average weekly salary across all industries has gone up during the COVID-19 pandemic.
🧐 Which industries have grown their weekly salary the most?
#Calculate the median in each quarter
income_grouped = income_df.groupby([income_df.index.to_period('Q'), 'industry', 'industry_code']).median()
#Filter results between 2019Q2 and 2023Q2
income_clean = filter_datarange(df = income_grouped, date_range = date_range, index_columns = ['industry', 'industry_code'], date_column = 'date', value_column = 'value')
#Gather the top 10 industries with
best_industries_income = income_clean.xs(date_range[1]).nlargest(n = 10, columns = 'value').sort_values('cumulative_growth', ascending=False)
best_industries_income
| industry | industry_code | value | cumulative_growth | |
|---|---|---|---|---|
| date | ||||
| 2023Q2 | Steel product manufacturing from purchased steel | 3312 | 2235.86 | 65.521173 |
| 2023Q2 | Alumina and aluminum production and processing | 3313 | 2153.54 | 52.159229 |
| 2023Q2 | Non-ferrous metal (except aluminum) production... | 3314 | 2426.11 | 35.451363 |
| 2023Q2 | Computing infrastructure providers, data proce... | 518,5182 | 2216.21 | 26.474348 |
| 2023Q2 | Software publishers | 5132 | 2269.17 | 19.054040 |
| 2023Q2 | Electric power generation, transmission and di... | 2211 | 2197.34 | 13.580515 |
| 2023Q2 | Mining, quarrying, and oil and gas extraction | 21 | 2332.81 | 8.296272 |
| 2023Q2 | Oil and gas extraction | 211,2111 | 2807.41 | 3.847377 |
| 2023Q2 | Mining and quarrying (except oil and gas) | 212 | 2193.31 | 3.002282 |
| 2023Q2 | Metal ore mining | 2122 | 2285.84 | 0.999907 |
best_industries_income_list = best_industries_income['industry_code'].to_list()
plot_line_graph(df = income_clean, industry_codes = best_industries_income_list, y_col = 'cumulative_growth', title = ''.join(['Industries with the highest cumulative growth in weekly income since ', date_range[0]]), y_axis_title = 'Cumulative growth (%)')
✅ Following a similar trend as before, we see that the "Computing infrastructure providers, data processing, web hosting, and related services" industry is among the top 10 industries with the highest weekly salary gains since 2019Q2.
income_subset_list = ["Computing infrastructure providers, data processing, web hosting, and related services", "Industrial aggregate excluding unclassified businesses"]
income_subset = income_grouped.query('industry in @income_subset_list').reset_index(col_level = 0)
income_subset['date'] = income_subset['date'].astype('str')
sns.relplot(income_subset, x = 'date', y = 'value', kind = 'line', hue = 'industry')
plt.xticks(income_subset[income_subset['date'].str.contains('Q2')]['date'].unique().tolist())
plt.axvline('2020Q2', color='black', linestyle='--')
plt.xticks(rotation=90)
# Set labels and title
plt.xlabel('Date')
plt.ylabel('Average weekly income ($CAD)')
plt.show()
🤓 In fact, we can see the massive jump in average weekly income for "Computing infrastructure providers, data processing, web hosting, and related services" starting in 2020Q2!
👉 Different industries have shown distinct growth trajectories in the aftermath of the COVID-19 pandemic.
👉 Online and data-related companies have notably outperformed other sectors since 2020, marking a shift towards digital reliance in the economy.
👉 Traditional sectors such as manufacturing, transport, and leisure bore significant negative impacts from the pandemic in 2020.
👉 Despite their initial robust growth during the pandemic, online and data-centric industries are experiencing a slowdown in their growth rate when compared to the broader industry average.
👉 Conversely, the worst-hit sectors during the pandemic, including manufacturing, transport, and leisure, are still struggling to achieve full recovery.
👉 The computer-related industry has seen a significant surge in job growth since 2020, signifying an increasing demand for tech skills in the economy.
👉 Reflecting a similar trend, the computer and data-related sectors have demonstrated considerable salary growth in recent years, underscoring their rising importance in the economy.